Plasma proteome of growing tumors

Early detection of cancer is vital for the best chance of successful treatment, but half of all cancers are diagnosed at an advanced stage. A simple and reliable blood screening test applied routinely would therefore address a major unmet medical need. To gain insight into the value of protein biomarkers in early detection and stratification of cancer we determined the time course of changes in the plasma proteome of mice carrying transplanted human lung, breast, colon, or ovarian tumors. For protein measurements we used an aptamer-based assay which simultaneously measures ~ 5000 proteins. Along with tumor lineage-specific biomarkers, we also found 15 markers shared among all cancer types that included the energy metabolism enzymes glyceraldehyde-3-phosphate dehydrogenase, glucose-6-phophate isomerase and dihydrolipoyl dehydrogenase as well as several important biomarkers for maintaining protein, lipid, nucleotide, or carbohydrate balance such as tryptophanyl t-RNA synthetase and nucleoside diphosphate kinase. Using significantly altered proteins in the tumor bearing mice, we developed models to stratify tumor types and to estimate the minimum detectable tumor volume. Finally, we identified significantly enriched common and unique biological pathways among the eight tumor cell lines tested.


Results
Proteomic analysis of NSCLC tumor-bearing mice. Erlotinib-sensitive (H1650) or erlotinib-resistant (H1975) NSCLC cells were implanted subcutaneously on study day 0 into the hind flank of female NCr mice and tumors were monitored until they reached a maximal size of 2 g (Fig. 1a). We analyzed protein levels in plasma samples collected from 5-6 individual animals on study days 0 (prior to implantation), 9, 19, 30, 40 and 44 for H1650 tumors and study days 0, 9, 19, 27 and 30 for H1975 tumors. Sampling time differences reflect an adjustment for differences in tumor growth rate. We identified 473 proteins in the slower-growing H1650 tumors and 1,345 proteins in the faster-growing H1975 tumors (Supplementary Table S1) that were statistically significant (fdr corrected p-value ≤ 0.05). Of these, 248 in H1650 and 159 in H1975 exhibited a fold-change from baseline ≥|2| (Fig. 1b, c; Supplementary Table S2). The distribution of changes was markedly skewed toward increased levels, as expected in mice with metabolically active tumors (Fig. 1b, c for H1650; Supplementary  Fig. S1 for H1975). In comparison, 37 proteins changed significantly with time in control animals with no tumor cells implanted (Supplementary Table S3). After removing those proteins with significant change in the control group, we identified 98 statistically significant proteins in common between the two NSCLC xenograft models (Supplementary Table S4 and Fig. 2a). We also identified proteins with significantly increased expression unique to either the H1650 (150 proteins) or to the H1975 tumors (62 proteins) (Supplementary Table S5; Fig. 2b, c). To establish whether these biomarkers are expressed by H1650 and H1975 cells in culture, we measured the proteome changes of cell-conditioned media and cell lysates over several days ( Fig. 2; Supplementary Tables S6 and  S7). A substantial fraction of the significantly changed common plasma proteins could also be detected above background levels in either supernatants or cell lysates (98 and 99 percent in lysates; 63 and 73 percent in media of H1650 and H1975 lines, respectively), indicating that these biomarkers could be produced by the implanted tumors.
Tumor detection from protein signatures of NSCLC tumors. We used machine learning to build a model to differentiate tumor-bearing from non-tumor-bearing mice and to predict tumor volume. Using the common set of 98 statistically significant proteins shared between the H1650 and H1975 models (Supplementary Table S4) we trained an elastic net linear regression model, regressing on the tumor volume. This led to the selection of a 19-analyte model (common lung model) for tumor volume prediction (Supplementary Table S8). A training/testing split of 65/34 samples was used to train the model and evaluate model performance. There was strong correlation between actual versus predicted tumor volume with an explained variance (R 2 ) for the training and testing sets of 0.90 and 0.87, respectively (Supplementary Table S9 and Fig. 3a). A degree of homoskedasticity in prediction (that is, increased scatter) is observed with larger tumor volumes which is expected when saturation of analyte signals occurs resulting in less discriminatory information at larger tumor volumes (Fig. 2).
The ability to differentiate tumor-bearing from non-tumor-bearing mice was evaluated by scanning the empirically predicted tumor volume on the holdout test set to define sensitivity/specificity metrics. Non-tumor bearing samples were derived from all time points of the control group and from study day 0 samples of the other groups (n = 16). Tumor volume prediction of non-tumor-bearing mice ideally should be centered at 0 mm 3 ; however, the variability around the prediction for these mice may infer a lower limit for the minimum detectable tumor volume (MDTV) required for classification. For the common lung model, predicted tumor volumes < 0.75 mm 3 , an empirically predicted value of one of the samples, would classify the mouse as a non-tumor-bearing animal.  (9, 19, 30 and 40) relative to Study Day 0, for H1650 implanted mice (top panels) or for non-implanted (control) animals (bottom panels). Circles indicate individual analytes and vertical lines indicate a Log 2 fold-change of |1|. Significant analytes (fdr corrected p-value ≤ 0.05 with median Log 2 fold-change ≥|1|) are indicated by red circles. Volcano plots for H1975 tumors are shown in Supplementary Fig. S1. (c) Heatmap representing the fold changes of statistically significant analytes (as defined above). Individual animal Log 2 fold-changes were calculated relative to the median values on Study Day 0 of control mice. For each Study Day indicated there are multiple rows, each representing an individual animal while each column represents a different protein analyte. Top panels show H1650 implanted mice and bottom panels show un-implanted (control) mice. Heatmap for H1975 tumors is shown in Supplementary Fig. S1 www.nature.com/scientificreports/ www.nature.com/scientificreports/ If an animal was predicted to have a tumor size > 0.75 mm 3 we would classify it as tumor-bearing mouse with a 94% specificity and 100% sensitivity (Supplementary Table S9).
To determine if tumor-prediction models are better using cell line-specific protein biomarkers, we repeated the analysis with biomarkers specific to either H1650-(150 proteins) or H1975-derived (62 proteins) tumors (Supplementary Table S5; Fig. 3b, c). To develop the H1650 model we used the same data set, elastic net method and training/testing split as above with the addition of manually imputing the H1975 tumor volumes to zero. This resulted in the selection of 28 analytes for H1650-derived tumor volume predictions (Supplementary Table S10). A similar method was used to select 3 analytes for H1975-derived tumor volume predictions (Supplementary Table S10). The performance of the cell line specific models was similar to the common-lung model with a training(testing) R 2 of 0.94(0.86) and 0.62(0.81) for the H1650 and H1975 models, respectively (Supplementary Table S9). If an animal was predicted to have a tumor size > 0.22 mm 3 and > 0.36 mm 3 for H1650 and H1975 models, respectively, we would classify it as a tumor-bearing mouse (with a 96% specificity and 100% sensitivity).
We also assessed the ability of the common lung model to predict tumor volumes after erlotinib treatment for both erlotinib resistant H1975-derived tumors and for erlotinib sensitive H1650-derived tumors. Separate groups of mice bearing these tumors were administered a 100 mg/kg oral dose of erlotinib once daily for 14 consecutive days (n = 6/group). Treatment with erlotinib was initiated when tumors were approximately 500 mm 3 . As expected, tumor volumes decreased after drug treatment for H1650-derived tumors but not for H1975-derived tumors (Fig. 3d). The common lung tumor volume prediction model continued to perform well, accurately predicting volumes after erlotinib treatment for both tumor types (Fig. 3e). Compared to vehicle alone, signal changes were attenuated for many markers in the model for the erlotinib sensitive tumor (SER-PINA3, β1,4-galactosyltransferase 1(B4GalT1), cystatin M, dihydrolipoamide dehydrogenase, mitochondrial (DLD), fibronectin fragment 3 (FN1.3), interleukin-15 receptor subunit alpha (IL-15 Ra), interleukin-8 (IL-8), kazal-type serine protease inhibitor domain-containing protein 1 (KAZD1), nucleoside-diphosphate kinase B (NDPK), plexin domain-containing protein 1 (PXDC1), protein S100-A6 (S100A6), and tenascin), but not for the resistant tumor ( Supplementary Fig. S2).  Table S11 for timepoints, Supplementary Tables S12-S14 with data for all analytes; Supplementary Tables S15-S17, ranked by fold-change). As with NSCLC tumors, we observed both unique and shared biomarkers for cell lines of the same tumor type. Some of the shared analytes were also present in other cancer types (Fig. 4b, c, Supplementary Tables S18-S21); others were unique for each cancer type (Supplementary  Tables S22-S25). Using biomarkers shared among both cell lines of each tumor type led to prediction models of tumor presence for tumor volumes of > 0.03 mm 3 for ovarian, 3.4 mm 3 for breast and 15 mm 3 for colon tumors with a 94% specificity and 100% sensitivity, and good concordance of actual versus predicted tumor volume in the training and testing datasets (Supplementary Tables S18-S20, S26, Supplementary Fig. S3).

Tumor detection from plasma protein signatures of colon, breast, and ovarian tumors.
Tumor detection using plasma biomarkers shared among all tumor types. We identified 15 plasma biomarkers shared among all four tumor types and all eight cell lines (Supplementary Table S27, Fig. 4b,  c and Supplementary Table S6). Using these biomarkers, we applied an elastic net linear regression model as described above to predict tumor volume using a training/testing split of 285/107 samples. There was good concordance of actual versus predicted tumor volume in the training and testing datasets of this pan cancer prediction model (Fig. 5a). If an animal was predicted to have a tumor volume > 3.9 mm 3 it would be categorized as tumor bearing with 99% sensitivity and 97% specificity, a performance similar to that obtained with the tumor type specific models (Supplementary Table S28, Fig. 5a).
This pan-cancer model also demonstrated the ability to predict tumor response to treatment. When animals bearing the erlotinib-sensitive H1650-derived lung tumor were treated with erlotinib, signals of all 15 of these biomarkers declined relative to the signals from animals bearing the same tumor but administered vehicle alone (Fig. 6). In contrast, there were no prominent differences in signal readout for these 15 biomarkers between the erlotinib treated and vehicle control groups for animals bearing the erlotinib-resistant H1975-derived tumor.
Differentiation of tumor types. Although the previously described methods give an accurate prediction of tumor volume, they do not differentiate between the type of tumor. Towards this end, we used the unique biomarkers from each cancer type (Supplementary Tables S22-S25) and a multinomial logistic regression model across all cancer types and controls (Fig. 5b, Supplementary Table S29). The resulting model was highly sensitive (≥ 0.95) for selecting the correct tumor tissue (origin) type with mixed performance in specificity depending on the tumor type. The overall accuracy of selecting the correct tumor and differentiating non-tumor was 91%.
Individual analytes that emerge in plasma at the earliest stages of tumor growth. The changes in circulating proteins between the initial implantation day and the first day of observable tumor growth are massive, often spanning several orders of magnitude. Consequently, multivariate models can robustly differentiate tumor-bearing from non-tumor-bearing mice. Nevertheless, one could ask which individual proteins can be detected early, during the initial stages of tumor growth, for each tumor type, based on the rate of increase  www.nature.com/scientificreports/ in plasma as a function of tumor volume. To answer this question, we selected proteins with the largest foldchanges that were statistically significant between pre-implantation and the first blood collection after implantation (Supplementary Table S11). Exponential growth curves were fit to each analyte signal with respect to tumor volume and estimates of the 95% confidence interval (CI) for the fit parameters were used to estimate a 'lower limit of detection' for tumor volume ( Fig. 7 and Supplementary Table S30 for representative biomarkers  and Supplementary Table S31 for the complete list). Several of the proteins whose level changes early for some tumor types (glyceraldehyde-3-phosphate dehydrogenase (GAPDH), B4GalT1, death receptor 6 (DR6, also known as tumor necrosis factor receptor superfamily member 21  www.nature.com/scientificreports/ the MDTV estimates cover a range of tens to hundreds of cubic millimeters for different tumor types (Supplementary Table S31).
Pathway analyses. Pathway analysis was performed to identify significantly enriched GO biological processes and pathways across the eight xenograft models using significant analytes (fdr corrected p-value ≤ 0.05 and fold-change from baseline ≥|2|) (Supplementary Tables S2, S15-S17). The number and type of enriched Enriched pathways for the ovarian xenografts could be grouped into several clusters, with small molecule metabolism related pathways being especially well represented followed by translational regulation, RNA processing and immune response pathways. Pathways detected in breast xenografts were also enriched for small molecule www.nature.com/scientificreports/ metabolism, as well as those involving proteolysis and the immune response. Proteolysis and cell migration/ adhesion and extracellular matrix pathways were enriched in lung xenografts. Finally, colon xenografts had the fewest number of enriched pathways discovered and were mostly present in translational regulation and RNA processing clusters. The small molecule metabolism pathway was the only one to show significant enrichment in all four tissue types while the immune response and proteolysis pathways showed significant enrichment in 3 of the 4 tissue types. Additional functional pathways were detected in varying xenograft models. One of the striking features of each of the four tumor types is a clear pathway signature (Fig. 8), illustrating differences in the relative importance of distinct pathways for tumor growth.

Discussion
There is a well-known direct correlation between tumor burden and metastatic risk 14 , tumor burden and resistance to chemotherapy 15 , and an inverse correlation between tumor burden and curability 16 . The importance of detecting tumors at early stages has therefore remained high, with much room for improvement over existing methods. Among other molecules, solid tumors (including tumor cells and associated stroma) produce proteins which end up in systemic circulation. For detection of such proteins in blood, a major difficulty is related to the need to detect a small volume of a malignancy remotely, after massive dilution of tumor-associated analytes in the total circulating volume of blood. In this study, we identify sentinel proteins in plasma for four different tumor types, each represented by two cell lines.
In both NSCLC cell lines, we were able to identify many highly statistically significant protein changes in plasma during the subcutaneous growth of transplanted tumors, with a strong bias toward increased protein levels. These changes included protein biomarkers shared between H1650 and H1975 tumors, as well as those unique to each tumor subtype. As tumors grow in the subcutaneous environment, changes in protein composition Nodes represent enriched gene sets, which are then clustered with related gene sets according to their gene content. Enrichment results were mapped as a network of gene sets (nodes). Node size is proportional to the total number of genes within each gene set. The proportion of shared genes between gene sets is represented as the thickness of the edge connecting nodes. The network map was manually curated by assigning functional categories to each cluster and by removing singleton gene sets. A complete list of enriched gene sets can be found in Supplementary Table S32, and each panel shows enriched gene sets for the 2 cell line xenografts for a given tissue type. www.nature.com/scientificreports/ represent the cumulative contribution of proteins secreted by tumor cells, stromal cells, and to a lesser extent, proteins secreted from other organs, in response to the acute phase of tumor growth. To assess whether the tumor cells used for the xenograft studies could produce the proteins identified in plasma, we investigated the proteome that these cells produce in cell culture (both secreted proteins in the cell culture supernatant, as well as intracellular proteins from cell lysates). Most of the statistically significant common analytes identified in plasma of animals bearing H1650-or H1975-derived tumors were produced by these cells in culture. Nevertheless, it remains to be established whether the origin of these proteins are the tumor cells themselves or the host response to the tumors.
Predicting the presence of a tumor based on simultaneous measurement of multiple plasma proteins is possible using either common or unique biomarkers. Using the common biomarkers in a 19-analyte lung model, we could distinguish human lung tumor-bearing from non-tumor-bearing mice for tumors > 0.75 mm 3 . This model was also able to predict tumor volumes for both tumor cell lines before and after erlotinib treatment suggesting the possibility of developing more generalized clinically useful predictions of tumor burden from a simple blood test. Using biomarkers unique to each of the tumors resulted in prediction of tumors at a somewhat earlier, but overall comparable stage (> 0.22 and > 0.36 mm 3 , respectively).
How do these estimates compare to human tumor sizes? Accounting for the differences in blood volume between a mouse and a human, we estimate that the common lung model could detect spherical lung tumors around 1.7-2.0 cm in diameter (or about the size of an olive) while the cell-type specific models would do a bit better (1.1-1.3 cm). We get similar estimates when we use tumor weight to body weight ratios: 1.5-1.8 cm in diameter for common model and 0.9-1.2 cm for cell-type-specific models. These estimates are roughly equivalent to the minimum median detection estimates for circulating tumor DNA (ctDNA) of 2.0-2.3 cm 17 . Both estimates are better than the 3.5 cm current median detection size for lung tumors from the 2005 to 2015 SEER database as discussed in Avanzini et al. 17 .
We then extended these analyses to subcutaneous growth of cell lines representing ovarian, colon and breast tumors. As with NSCLC tumors, massive changes in the plasma proteome were detected with both shared and unique proteins readily identifiable in all the pairs of cell lines. Models analogous to those described for NSCLC tumors allowed the identification of tumors in the very small size range for ovarian tumors (> 0.03 mm 3 ), and larger tumors for breast (> 3.4 mm 3 ) and colon (> 15 mm 3 ). Using the same adjustment for blood volume differences, this is equivalent to 0.6, 2.8 and 4.6 cm in diameter human tumors-a substantial range from very small to medium size tumors.
Using biomarkers shared among all tumor types we developed a pan cancer prediction model that can detect the presence of any of these tumors at volumes of > 3.9 mm 3 . Many of these non-tumor differentiating biomarkers were among the group with the largest early changes. Tumor type can subsequently be established using proteins unique to each tumor. These observations demonstrate that a general-to-specific tumor type identification approach is feasible, without a substantial compromise in the ability to detect tumors at early stages.
For tumor growth and invasion, cancer cells evolve toward acquiring several capabilities: maintenance of proliferative state, avoidance of apoptosis, increase in angiogenesis, activation of invasion/metastasis, continuous replication, and avoidance of tumor suppressors 18,19 . These malignant cells reprogram their metabolic pathways to sustain their need for energy and building blocks for tumor growth. Indeed, of the 15 proteins in the pan cancer model (Supplementary Table S27), three (GAPDH, phosphoglucose isomerase (PGI), and DLD) are important for energy metabolism and seven (ASL, WRS, NDPK, QSCN6, B4GalT1, lysophosphatidic acid phosphatase type 6 (ACP6), and BTD) are important for maintaining protein, lipid, nucleotide, or carbohydrate balance (Supplementary Table S33). However, many of these proteins (GAPDH, PGI, DLD, ASL, WRS, and NDPK) are multifunctional, or so called moonlighting proteins, involved in a growing list biological pathways including tumorigenic pathways [20][21][22][23][24][25][26][27][28][29] and several shown to have altered expression levels in some human cancers [30][31][32][33][34][35][36][37][38] . An example of note is WRS which catalyzes the loading of tryptophan onto its cognate tRNA. Since tryptophan is the least abundant of the proteinogenic amino acids, it may play a rate-limiting role during protein synthesis, so the increase in WRS in tumor cells could globally accelerate protein synthesis 39 . Depletion of tryptophan in tumors results in post-transcriptional tryptophan-to-phenylalanine substitution in proteins through codon reassignment 40 . Interferon gamma can upregulate the synthesis and secretion of WRS from endothelial cells, fibroblasts, and macrophages where it interacts with TLR2 and/or TLR4 on macrophages stimulating the innate immune response 29,41 . WRS also has roles in the extracellular domain: following cleavage by plasmin and/or elastases, extracellular WRS acts on endothelial cells to inhibit angiogenesis via interaction with VE-cadherin 29 .
Significantly enriched pathways in the xenograft models varied by tissue type, and to a lesser extent by the specific cell line used to construct the xenograft (Fig. 8). Enriched pathways fell into one of several functional clusters, including small molecule metabolism, translational regulation, RNA processing, immune response, proteolysis, cell migration or adhesion and extracellular matrix. Additional signaling pathways were also detected. Although we found some functional clusters specific to a given tissue, for example extracellular matrix and cell migration/adhesion pathways were lung specific, most pathway groups were found in more than one tissue type. These results also suggest a concerted biological response shared across different xenograft systems.
Our study has several limitations. Since this was a longitudinal mouse study, plasma volumes were small and insufficient for orthogonal validation of these biomarkers using the samples from this study. However, the aptamers used for measuring 188 of the 236 analytes that are significantly different with at least a twofold change in our study have had at least one previously performed orthogonal confirmation of their binding specificity (see Supplementary Table S21). For example, all 15 biomarkers that are shared among all cancer types in our study, are responsive to mutations in the genes that encode them (that is, all have exhibited cis pQTL associations). In addition, 13 of these 15 biomarkers have also been confirmed by mass spectrometry (enrichment of a cognate protein from a biological sample, either human plasma, serum, urine, or cell culture) as well as other orthogonal www.nature.com/scientificreports/ specificity confirmations (Supplementary Table S21). We also do not have information about the impact of posttranslational modifications on the binding affinity of SOMAmer reagents used in the SomaScan assay. This study identifies early sentinel proteins of tumor presence and estimates the MDTV that can also be stratified using protein biomarkers in blood. Since each tumor arises from clonal expansion of cells within an organ or tissue but with diverse paths to transformation, we observe both common and unique protein signatures, within as well as across tumor types. In this context, however, it is also important to note other limitations of this study: these models use established cell lines that are not grown in their native microenvironment which could have an impact on both the tumor cell expression pattern and on the expression pattern of the surrounding normal tissue. Perhaps even more importantly, the tumor cells used for implantation in this study are in a more advanced state of selection/evolution at the time they were collected from human patients than would be typically present in the earliest true cancerous cell population or in a precancerous cell population with changes indicating a likely trajectory to cancer 2 . At this time, we do not have appropriate matching human blood samples from early-stage malignancies for comparison to the mouse data reported in this study. Nonetheless, our study provides important insights into the use of protein biomarkers for early detection, progression, and stratification of cancer, including treatment based on the changing proteome in xenografts transplanted with patient-derived tumors.

Methods
All methods are reported in accordance with the ARRIVE guidelines. Animal procedures. All animal procedures were approved by the Institutional Animal Care and Use Committee at Inotiv-Boulder, Inc. (Boulder, CO) and conducted in accordance with all state and federal guidelines. Female NCRNU mice (5-6 weeks old) were obtained from Envigo and acclimatized for seven days (18-26 °C, a relative humidity 30-70%;12-h light/dark cycle). Food and water were provided ad libitum. Animals were randomly assigned to groups by body weight and tumors were established by injecting 0.5-1 × 10 7 cells per animal in a single subcutaneous site in the flank (5-12/group; Supplementary Table S11). Lung tumors were treated with erlotinib (100 mg/kg) or vehicle control (6% Captisol in water) administered orally once daily for 14 consecutive days beginning on study day 14 for H1975 tumors and on study day 23 for H1650 tumors. Dose volumes were calculated using the most recent body weight. Animals were monitored daily for morbidity/mortality and body weights and tumor volumes measured twice weekly. Volumes were calculated using the formula: length × (width) 2 /2. At end of study, euthanasia was performed according to the AVMA guidelines for the euthanasia of animals. Anesthesia was first induced by 3% isoflurane followed by exsanguination and pneumothorax.

Reagents.
Plasma and tumor collection. Ethylenediaminetetraacetic acid (EDTA) blood samples were collected from orbital sinus bleeds from anesthetized mice. Plasma was obtained by centrifugation at 3600×g for 10 min at 4 °C then frozen (− 80 °C). On the last day (Supplementary Table S11) animals were sacrificed, tumors excised, weighed, and frozen in liquid nitrogen. Table S11) were plated in media containing 10% FBS and placed in a humidified 37 °C incubator (5% CO 2 ). Media was aspirated daily and control media (growth media with 0.1% FBS and 0.1% DMSO) added to each well. After a further 24-h incubation, the cell-conditioned media was collected (Day 1) from 3 individual wells and each centrifuged for 5 min at 16,000×g (4 °C). Supernatants were stored at − 80 °C. Unconditioned media was used as the day 0 control. For cell lysates, media was aspirated, cells washed twice with ice-cold PBS. M-PER lysis buffer (250 µL) containing HALT was added for 2 min at 4 °C while gently rotating the sample. Cells were removed by scrapping and debris pelleted by centrifugation (5 min at 16,000×g, 4 °C. Supernatants were stored at − 80 °C. To remaining wells, growth media with 0.1% FBS and 0.1% DMSO was added, and the cycle repeated for the Day 2 and 3 collections.

H1650 and H1975 cell culture. Cells (1 × 10 5 cells) (Supplementary
Tumor lysates. Frozen tumors (10-20 mg) were pulverized (BioSpec Products 59012N Tissue Pulverizer; # UX-36903-00) and thawed on ice. Twenty volumes of T-Per with HALT were added and samples homogenized (DMK Life Sciences Tissue Grinder; #K749540-0000) for 5 min (4 °C). Samples were centrifuged and supernatants stored as described above. www.nature.com/scientificreports/ Proteomic analyses. Plasma proteomic analyses were conducted using using a 3-dilution (5%, 2%, and 0.05%) SomaScan assay generally as described 6,42 . Briefly, 11 µL of each sample was mixed with 2.5 µL of 27 mM EDTA followed by the addition of 132 µL of sample diluent and 74.5 µL of assay buffer. The resulting mixture, containing 5% plasma, was added to the first assay plate (100 µL). The remainder was diluted serially to 2% and 0.5% plasma using assay buffer and 100 µL of each added to the second and third assay plates, respectively. Cell and tumor lysate samples were run in a single dilution assay at 20 µg/mL protein in assay buffer containing 1 × HALT and 0.5 µM Z-block using the same assay version but with Agilent slides with fewer anti-aptamer probes. Protein concentrations were determined by the micro-BCA method. Cell conditioned media samples were run in a single dilution assay after diluting fivefold into assay buffer containing 1 × HALT and 0.5 µM Z-block. The SOMAscan assay (v3) used in this study is based upon a mixture of slow off-rate modified aptamers (SOMAmer ® reagents) targeting about 5,000 human proteins. Each reagent contains a 5′-fluorophore (cyanine 3), a biotin, and a photocleavable linker. This mixture was immobilized on streptavidin-coated beads and incubated for 2 h at 28 °C with sample concentrations and buffers as described above such that each SOMAmer reagent was at ~ 0.5 nM. After washing with assay buffer, bound proteins were biotinylated and the SOMAmer reagents released from the beads using ultraviolet light. A kinetic challenge was applied by adding a polyanionic competitor to the mixture to enrich for specific protein:SOMAmer complexes which have slower off-rates than any non-specifically bound SOMAmer reagents. Those complexes that survived kinetic challenge and free proteins, were then captured on streptavidin beads. After further washing, protein bound SOMAmer reagents were eluted from the beads and quantified by hybridization to a DNA array (Agilent) containing an anti-SOMAmer DNA probe for each SOMAmer reagent in the original mixture. Target protein concentration in the sample is proportional to the fluorescent intensity of each feature of the array as measured via the cyanine 3 dye. All samples were randomized on the assay plates and assay operators were blinded to the study groups. Information on assay cross reactivity to mouse proteins is given in the Supplementary Note and Supplementary Fig. S5).
Data pre-processing. Variation in hybridization efficiency was normalized using the median fluorescent ratio of a set of 12 independent hybridization control oligonucleotides added at fixed concentrations that span the dynamic range of the Agilent reader. Plate to plate variations were corrected using 5 mouse EDTA-plasma calibrator samples per plate. Calibration scale factor ratios were calculated from the median signal for each SOMAmer reagent from the mouse calibrators to a fixed reference and applied to all mouse samples on the plate. Variability in overall sample intensity was corrected on a sample-by-sample basis, and by dilution group, by calculating the median ratio of each analyte relative to the intra-plate median and using the median value of those ratios as a scale factor applied to the individual sample.

Statistical analyes.
Statistical analyses were carried out in R (3.5.2). Assay signals were Log 10 transformed prior to inference testing. Analyst was not blinded to the group identities.
Plasma data. For each cell line and analyte, repeated measures analysis of variance (RMNOVA) was used first to identify statistically significant changes with respect to study days using each animal as the repeated measure. p-values were adjusted for multiple comparisons using the Benjamini-Hochberg technique. Analytes with an adjusted p-value ≤ 0.05 and a median Log2-fold change ≥|1| relative to the signal on study day 0 were preliminarily selected for inclusion into models. For further pruning to prevent false positives, we evaluated Pearson's correlation between tumor volume and analyte signal. Analytes were removed from inclusion if Pearson's correlation coefficient was <|0.25|.
Conditioned media and lysates. ANOVA was used to identify statistically significant changes of analyte signal with respect to study days for each lung cell line (3/line/time-point). p-values were adjusted for multiple comparisons as above and median Log 2 -fold change determined relative to day 0.
Tumor lysates. Differential expression for each analyte between excised lung cancer tissues was evaluated by Student's unpaired t-tests (n = 3/tumor). Analytes with fdr corrected p-value ≤ 0.05 and above a median Log 2 -fold change ≥|1| are reported.
Tumor size and classification prediction models. Elastic net regression models (linear and logistic) were trained using 'Caret' (6.0.86) 43 in R allowing the mixing parameters, and the regularization parameter, to grid-search over [0, 1]. Prior to training for tumor size prediction, the tumor volume was transformed with a cubic root and raw RFU values Log 10 transformed, centered, and scaled. A 20% hold out set was used to evaluate model performance metrics. MDTV predictions were estimated by nonlinear modeling of the signal of an analyte with respect to tumor volume defined by: Parameters A, B, and C were estimated through nonlinear least squares R package stats 3.6.3. MDTV was estimated by first establishing 95% CIs around the signal of non-tumor bearing mice for each cell line. RFU variance, with respect to tumor volume, was assumed to be constant and MDTV calculated by inverting the previous equation and determining the volume at which the lower 95% CI the absence of any tumor intersected the upper 95% CI. www.nature.com/scientificreports/ Pathway analyses. Analysis of enriched pathways was performed for the eight xenograft models. Differentially expressed proteins (fdr q ≤ 0.05 and Log 2 -fold-change >|2|), excluding analytes which changed in control mice, were used as input for the g:profiler pathway enrichment method using gprofiler2.v0.2.1 in R 44,45 . Annotated gene sets representing Gene Ontology biological processes, Hallmark pathways, C6 Oncogenic signature gene sets and Canonical Pathways (incl. Biocarta, KEGG, PID, Reactome and Wikipathways) were downloaded from the Molecular Signatures Database v7.5.1 for use as target pathways 46 . Due to the semi-targeted nature of the SomaScan protein quantification method a unique list of all proteins interrogated on the platform (n = 4132) were used as a statistical null background. An fdr threshold of q < 0.25 was used to select enriched gene sets for further analysis. The Cytoscape Enrichment Map plugin v3.3.4 47 was used for clustering and visualization of significantly enriched gene sets detected for each xenograft model. A Jaccard plus overlap combined coefficient of 0.375 was used for defining Default cluster edges (default setting). Singleton gene sets were removed from visualization if a single cell line model contributed to enrichment and the gene set was not connected to a larger functional cluster.

Data availability
All summary data are included in this manuscript (and its Supplementary Information files). All raw proteomics data will be made available from the corresponding author on reasonable request.